Abstract

We study greedy-type algorithms that pick several dictionary elements at
each greedy step, in contrast to the single dictionary element picked by
standard greedy-type algorithms. We call such algorithms super greedy
algorithms. The idea of picking several elements at a greedy step of the
algorithm is not new. Recently, we observed the following new phenomenon:
for incoherent dictionaries, super greedy algorithms provide the same
upper bound for the error (in the sense of order) as their analogues among
the standard greedy algorithms, while being computationally simpler than
these analogues. We continue to study this phenomenon.

Keywords: super greedy algorithms, thresholding, convergence rate,
incoherent dictionary

1 Introduction. Weak Super Greedy Algorithm

This paper is a follow-up to the paper [3]. We continue to study greedy-type
algorithms that pick several dictionary elements at each greedy step, in
contrast to the single dictionary element picked by standard greedy-type
algorithms. We call such algorithms super greedy algorithms. We refer the
reader to [5] for a survey of the theory of greedy approximation. The idea of
picking several elements at a greedy step of the algorithm is not new; it was
used, for instance, in [9]. The new phenomenon that we observed in [3] is the
following: for incoherent dictionaries, super greedy algorithms provide the
same upper bound for the error (in the sense of order) as their analogues
among the standard greedy algorithms, while being computationally simpler
than these analogues. We continue to study this phenomenon here. We note
that the idea of applying a super greedy algorithm to incoherent dictionaries
was used in [2] for building an efficient learning algorithm.

We recall some notation and definitions from the theory of greedy
algorithms. Let $H$ be a real Hilbert space with an inner product
$\langle\cdot,\cdot\rangle$ and the norm $\|x\| := \langle x,x\rangle^{1/2}$
for all $x \in H$. We say that a set $\mathcal{D}$ of functions (elements)
from $H$ is a dictionary if each $g \in \mathcal{D}$ has unit norm
($\|g\| = 1$) and $\overline{\operatorname{span}}\,\mathcal{D} = H$. Let

$$M(\mathcal{D}) := \sup_{\varphi \neq \psi;\ \varphi,\psi \in \mathcal{D}} |\langle\varphi,\psi\rangle|$$

be the coherence parameter of the dictionary $\mathcal{D}$. We say that a
dictionary $\mathcal{D}$ is $M$-coherent if $M(\mathcal{D}) \le M$. The main
results of this paper concern the performance of super greedy algorithms
with regard to $M$-coherent dictionaries. We study two versions of super
greedy algorithms: the Weak Super Greedy Algorithm and the Weak Orthogonal
Super Greedy Algorithm with Thresholding. We now proceed to the definitions
of these algorithms and to the formulations of the main results.
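
For a finite dictionary in $\mathbb{R}^d$ the coherence parameter can be computed directly from its definition. The following is a minimal sketch under that assumption; the toy dictionary and the helper names (`normalize`, `coherence`, `max_block_size`) are illustrative and not taken from the paper. It also reports the largest $s$ with $s \le (2M)^{-1}$, which is the range of $s$ covered by the theorems below, and it assumes $M > 0$.

```python
from itertools import combinations
from math import floor, sqrt


def normalize(v):
    """Scale a nonzero vector to unit norm, as required of dictionary elements."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def coherence(dictionary):
    """M(D): largest |<phi, psi>| over pairs of distinct dictionary elements."""
    return max(
        abs(sum(a * b for a, b in zip(phi, psi)))
        for phi, psi in combinations(dictionary, 2)
    )


def max_block_size(M):
    """Largest natural number s with s <= 1/(2M); assumes M > 0."""
    return floor(1.0 / (2.0 * M))


# Hypothetical toy dictionary in R^3 (an assumption made for illustration).
D = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.0, 1.0])]
M = coherence(D)
s_max = max_block_size(M)
print(M, s_max)
```

In this toy example the only nonzero inner product is between the first and the third element, so $M = 0.2/\sqrt{1.04}$ and at most two elements may be picked per greedy step.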

Let a natural number $s$ and a weakness sequence $\tau := \{t_k\}_{k=1}^\infty$,
$t_k \in [0,1]$, be given. Consider the following Weak Super Greedy Algorithm
with parameter $s$.

WSGA($s,\tau$). Initialization: $f_0 := f_0^{s,\tau} := f$. Then for each
$m \ge 1$ we inductively define:

(1) $\varphi_{(m-1)s+1}, \dots, \varphi_{ms} \in \mathcal{D}$ are elements of
the dictionary $\mathcal{D}$ satisfying the following inequality. Denote
$I_m := [(m-1)s+1, ms]$ and assume that

$$\min_{i \in I_m} |\langle f_{m-1},\varphi_i\rangle| \ge t_m \sup_{g \in \mathcal{D},\ g \neq \varphi_i,\ i \in I_m} |\langle f_{m-1},g\rangle|.$$

(2) Let $F_m := F_m(f_{m-1}) := \operatorname{span}(\varphi_i,\ i \in I_m)$ and
let $P_{F_m}$ denote the operator of orthogonal projection onto $F_m$. Define
the residual after the $m$th iteration of the algorithm

$$f_m := f_m^{s,\tau} := f_{m-1} - P_{F_m}(f_{m-1}).$$

(3) Find the approximant

$$G_m(f) := G_m^{s,\tau}(f,\mathcal{D}) := \sum_{j=1}^{m} P_{F_j}(f_{j-1}).$$

In the case $t_k = t$, $k = 1, 2, \dots$, we write $t$ instead of $\tau$ in
the notation. If $t = 1$, we call the WSGA($s$,1) the Super Greedy Algorithm
with parameter $s$ (SGA($s$)). For $s = 1$ the Super Greedy Algorithm
coincides with the Pure Greedy Algorithm (PGA) and the Weak Super Greedy
Algorithm coincides with the Weak Greedy Algorithm (WGA) (see [5]).
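
The three steps of the algorithm can be sketched for a finite dictionary in $\mathbb{R}^d$. The code below is an illustration under stated assumptions, not the authors' implementation: it realizes the SGA($s$) by taking the $s$ elements with the largest $|\langle f_{m-1}, g\rangle|$ (such a choice also satisfies the weak condition for every $t \in [0,1]$), and it computes the projection by solving the Gram system, which assumes the chosen elements are linearly independent (as is the case for an $M$-coherent dictionary with $s \le (2M)^{-1}$). The helper names `dot`, `solve`, `project` and `sga` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    rows = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(rows[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (rows[r][n] - tail) / rows[r][r]
    return x


def project(f, vectors):
    """Orthogonal projection of f onto span(vectors) via the Gram system."""
    gram = [[dot(u, v) for v in vectors] for u in vectors]
    coeffs = solve(gram, [dot(f, v) for v in vectors])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(len(f))]


def sga(f, dictionary, s, iterations):
    """Run SGA(s); returns (approximant G_m, residual f_m)."""
    residual = list(f)
    approximant = [0.0] * len(f)
    for _ in range(iterations):
        # Step (1): the s elements with the largest |<f_{m-1}, g>|.
        ranked = sorted(dictionary, key=lambda g: -abs(dot(residual, g)))
        block = ranked[:s]
        # Step (2): subtract the projection onto F_m = span(block).
        p = project(residual, block)
        residual = [r - x for r, x in zip(residual, p)]
        # Step (3): accumulate the approximant.
        approximant = [a + x for a, x in zip(approximant, p)]
    return approximant, residual


# Hypothetical example: the standard basis of R^3 as the dictionary.
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
G1, f1 = sga([3.0, 2.0, 1.0], D, s=2, iterations=1)
print(G1, f1)
```

With $s = 2$ a single iteration removes the two largest components at once, whereas the PGA ($s = 1$) would need two iterations and two separate searches over the dictionary.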

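For a general dictionary $\mathcal{D}$ we define the class of functions
(elements)

$$A_1^o(\mathcal{D},B) := \Big\{ f \in H : f = \sum_{k \in \Lambda} c_k g_k,\ g_k \in \mathcal{D},\ |\Lambda| < \infty,\ \sum_{k \in \Lambda} |c_k| \le B \Big\}$$

and we define $A_1(\mathcal{D},B)$ to be the closure (in $H$) of
$A_1^o(\mathcal{D},B)$. For the case $B = 1$, we denote
$A_1(\mathcal{D}) := A_1(\mathcal{D},1)$. We define the norm
$|f|_{A_1(\mathcal{D})}$ to be the smallest $B$ such that
$f \in A_1(\mathcal{D},B)$.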

The following open problem (see [10], p. 65, Open Problem 3.1) on the rate
of convergence of the PGA for $A_1(\mathcal{D})$ is a central theoretical
problem in greedy approximation in Hilbert spaces.

Open problem. Find the order of decay of the sequence

$$\gamma(m) := \sup_{f,\mathcal{D},\{G_m\}} \big( \|f - G_m(f,\mathcal{D})\|\,|f|_{A_1(\mathcal{D})}^{-1} \big),$$

where the supremum is taken over all dictionaries $\mathcal{D}$, all elements
$f \in A_1(\mathcal{D}) \setminus \{0\}$ and all possible choices of $\{G_m\}$.

We refer the reader to [5] for a discussion of this open problem. We
introduce the following generalization of the quantity $\gamma(m)$ to the
case of the Weak Greedy Algorithms:

$$\gamma(m,\tau) := \sup_{f,\mathcal{D},\{G_m^\tau\}} \big( \|f - G_m^\tau(f,\mathcal{D})\|\,|f|_{A_1(\mathcal{D})}^{-1} \big).$$

We prove here the following theorem.

Theorem 1.1. Let $\mathcal{D}$ be a dictionary with coherence parameter
$M := M(\mathcal{D})$. Then, for $s \le (2M)^{-1}$, the WSGA($s,t$) provides,
after $m$ iterations, an approximation of $f \in A_1(\mathcal{D})$ with the
following upper bound on the error:

$$\|f - G_m^{s,t}(f,\mathcal{D})\| \le C t^{-1} s^{-1/2} \gamma(m-1, rt),$$

where $r := \big(\frac{1-Ms}{1+Ms}\big)^{1/2}$ and $C$ is an absolute constant.

Theorem 1.1 with $t = 1$ gives the following assertion for the SGA($s$).

Corollary 1.2. Let $\mathcal{D}$ be a dictionary with coherence parameter
$M := M(\mathcal{D})$. Then, for $s \le (2M)^{-1}$, the SGA($s$) provides,
after $m$ iterations, an approximation of $f \in A_1(\mathcal{D})$ with the
following upper bound on the error:

$$\|f - G_m^{s}(f,\mathcal{D})\| \le C s^{-1/2} \gamma(m-1, r),$$

where $r := \big(\frac{1-Ms}{1+Ms}\big)^{1/2}$ and $C$ is an absolute constant.

It is interesting to note that even in the case of the SGA($s$), when the
weakness parameter is 1, we have the upper bound of the error in terms of
$\gamma(m,r)$, not in terms of $\gamma(m)$. For estimating $\gamma(m,r)$ we use the following
known result from [4].

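Theorem 1.3. Let $\mathcal{D}$ be an arbitrary dictionary in $H$. Assume
$\tau := \{t_k\}_{k=1}^\infty$ is a nonincreasing sequence. Then, for
$f \in A_1(\mathcal{D},B)$ we have

$$\|f - G_m^\tau(f,\mathcal{D})\| \le B \Big(1 + \sum_{k=1}^{m} t_k^2\Big)^{-t_m/(2(2+t_m))}. \tag{1.1}$$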

For the particular case $t_k = 1$, $k = 1, 2, \dots$, this theorem gives the
following result (see [8]). For each $f \in A_1(\mathcal{D},B)$, the PGA
provides, after $m$ iterations, an approximant satisfying

$$\|f - G_m(f,\mathcal{D})\| \le B m^{-1/6}.$$
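
The right-hand side of (1.1) is easy to evaluate for a concrete weakness sequence. The sketch below is only an illustration (the function name `wga_bound` and the sample values are assumptions); it evaluates the bound and shows how the case $t_k = 1$ yields $(1+m)^{-1/6} \le m^{-1/6}$.

```python
def wga_bound(B, t, m):
    """Right-hand side of (1.1) for a nonincreasing weakness sequence.

    `t` holds t_1, ..., t_m (at least m entries); `B` is the A_1-norm bound.
    """
    t_m = t[m - 1]
    return B * (1.0 + sum(x * x for x in t[:m])) ** (-t_m / (2.0 * (2.0 + t_m)))


# Hypothetical sample: B = 1, t_k = 1 for all k (the PGA), m = 8 iterations.
m = 8
pga_bound = wga_bound(1.0, [1.0] * m, m)
print(pga_bound, m ** (-1.0 / 6.0))
```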

For $f \in A_1(\mathcal{D})$, we apply the PGA and the SGA($s$). After $sm$
iterations of the PGA and $m$ iterations of the SGA($s$), both algorithms
provide $sm$-term approximants. For illustration purposes, take
$s = N^{1/2}$ and $m = N^{1/2}$. Then the PGA gives

$$\|f_N\| \le (sm)^{-1/6} = N^{-1/6}$$

and the SGA($s$) provides

$$\|f_m\| \le C s^{-1/2} m^{-\frac{r}{2(2+r)}} = C N^{-\frac{1}{4}-\frac{\delta}{2}},$$

where $\delta := \frac{r}{2(2+r)}$ and $r = \big(\frac{1-Ms}{1+Ms}\big)^{1/2}$,
so that $\frac{1}{4} + \frac{\delta}{2} > \frac{1}{6}$. Thus, in this particular
case, the SGA($s$) has a better upper bound for the error than the PGA.
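
The comparison of the two rates can be made concrete. Combining Theorem 1.1 (with $t = 1$) and Theorem 1.3 gives, up to constants, the rate $s^{-1/2} m^{-\delta}$ with $\delta = r/(2(2+r))$, which for $s = m = N^{1/2}$ is $N^{-(1/4+\delta/2)}$. The sketch below computes this exponent; the function name `sga_exponent` and the sample values of $M$ and $s$ are assumptions made for illustration.

```python
from math import sqrt

PGA_EXPONENT = 1.0 / 6.0  # from ||f_N|| <= N^(-1/6)


def sga_exponent(M, s):
    """Exponent 1/4 + delta/2 in the SGA(s) bound N^-(1/4 + delta/2)."""
    r = sqrt((1.0 - M * s) / (1.0 + M * s))
    delta = r / (2.0 * (2.0 + r))
    return 0.25 + delta / 2.0


# Hypothetical sample at the edge of the admissible range: M * s = 1/2.
e = sga_exponent(0.05, 10)
print(e, PGA_EXPONENT)
```

Even in the least favorable admissible case $Ms = 1/2$, where $r = 3^{-1/2}$, the exponent stays above $0.3$, compared with $1/6$ for the PGA.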



2 Weak Orthogonal Super Greedy Algorithm with Thresholding

In [3] we considered the following algorithm. Let a natural number $s$ and a
weakness sequence $\tau := \{t_k\}_{k=1}^\infty$, $t_k \in [0,1]$, be given.
Consider the following Weak Orthogonal Super Greedy Algorithm with
parameter $s$.

WOSGA($s,\tau$). Initially, $f_0 := f$. Then, for each $m \ge 1$ we
inductively define:

(1) $\varphi_{(m-1)s+1}, \dots, \varphi_{ms} \in \mathcal{D}$ are elements of
the dictionary $\mathcal{D}$ satisfying the following inequality. Denote
$I_m := [(m-1)s+1, ms]$ and assume that

$$\min_{i \in I_m} |\langle f_{m-1},\varphi_i\rangle| \ge t_m \sup_{g \in \mathcal{D},\ g \neq \varphi_i,\ i \in I_m} |\langle f_{m-1},g\rangle|.$$

(2) Let $H_m := H_m(f) := \operatorname{span}(\varphi_1, \dots, \varphi_{ms})$
and let $P_{H_m}$ denote the operator of orthogonal projection onto $H_m$.
Define

$$G_m(f) := G_m(f,\mathcal{D}) := G_m^{s}(f,\mathcal{D}) := P_{H_m}(f).$$

(3) Define the residual after the $m$th iteration of the algorithm

$$f_m := f_m^{s} := f - G_m(f,\mathcal{D}).$$

In [3] we proved the following error bound for the WOSGA(s, t). 

Theorem 2.1. Let $\mathcal{D}$ be a dictionary with coherence parameter
$M := M(\mathcal{D})$. Then, for $s \le (2M)^{-1}$, the WOSGA($s,t$) provides,
after $m$ iterations, an approximation of $f \in A_1(\mathcal{D})$ with the
following upper bound on the error:

$$\|f_m\|^2 \le A(t)(sm)^{-1}, \quad m = 1, 2, \dots, \qquad A(t) := (81/8)(1+t)^2 t^{-4}.$$






In this paper we modify the WOSGA in the following way: we replace the
greedy step (1) by a thresholding step. Here is the definition of the new
algorithm. Let $s$ be a natural number and let a weakness sequence
$\tau := \{t_k\}_{k=1}^\infty$, $t_k \in [0,1]$, be given.

WOSGAT($s,\tau$). Initially, $f_0 := f_0^{s,\tau} := f$. Then for each
$m \ge 1$ we inductively define:

(1) $\varphi_i \in \mathcal{D}$, $i \in I_m$, are elements of the dictionary
$\mathcal{D}$ satisfying the following inequalities: $s_m := |I_m| \le s$ and

$$\min_{i \in I_m} |\langle f_{m-1},\varphi_i\rangle| \ge t_m \|f_{m-1}\|^2. \tag{2.1}$$

(2) Let $H_m := H_m(f) := \operatorname{span}(\varphi_i,\ i \in I_1 \cup \dots \cup I_m)$
and let $P_{H_m}$ denote the operator of orthogonal projection onto $H_m$.
Denote

$$G_m(f) := G_m^{s,\tau}(f,\mathcal{D}) := P_{H_m}(f).$$

(3) Define the residual after the $m$th iteration of the algorithm

$$f_m := f_m^{s,\tau} := f - P_{H_m}(f).$$

For $s = 1$ the WOSGA coincides with the Weak Orthogonal Greedy Algorithm
(WOGA) and the WOSGAT coincides with the Modified Weak Orthogonal Greedy
Algorithm (MWOGA) (see [10], p. 61). We note that we can run the WOSGA and
the WOGA for any $f \in H$. It is proved in [10] that we can run the MWOGA
for $f \in A_1(\mathcal{D})$. In the same way one can prove that we can run
the WOSGAT for $f \in A_1(\mathcal{D})$. We note that in step (1) of the
WOSGAT, if there are more than $s$ elements $\varphi_i$ satisfying

$$|\langle f_{m-1},\varphi_i\rangle| \ge t_m \|f_{m-1}\|^2,$$

the algorithm may pick any $s$ of them and then make the projection.
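
The thresholding step can be sketched for a finite dictionary in $\mathbb{R}^d$ as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the weakness sequence is constant ($t_k = t$), the first $s$ elements passing the threshold are taken (any $s$ of them would do), already selected elements are skipped, and the selected elements are assumed linearly independent so that the Gram system is solvable. The helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    rows = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(rows[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (rows[r][n] - tail) / rows[r][r]
    return x


def project(f, vectors):
    """Orthogonal projection of f onto span(vectors) via the Gram system."""
    gram = [[dot(u, v) for v in vectors] for u in vectors]
    coeffs = solve(gram, [dot(f, v) for v in vectors])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(len(f))]


def wosgat(f, dictionary, s, t, iterations, tol=1e-12):
    """Run WOSGAT(s, t); returns (residual f_m, block sizes [s_1, ..., s_m])."""
    selected = []  # indices in I_1 ∪ ... ∪ I_m
    sizes = []
    residual = list(f)
    for _ in range(iterations):
        norm_sq = dot(residual, residual)
        if norm_sq <= tol:
            break
        # Step (1): elements with |<f_{m-1}, phi>| >= t ||f_{m-1}||^2.
        passing = [
            i for i, g in enumerate(dictionary)
            if i not in selected and abs(dot(residual, g)) >= t * norm_sq
        ]
        if not passing:
            break
        block = passing[:s]
        selected.extend(block)
        sizes.append(len(block))
        # Steps (2)-(3): project f onto H_m and form the residual.
        proj = project(f, [dictionary[i] for i in selected])
        residual = [a - b for a, b in zip(f, proj)]
    return residual, sizes


# Hypothetical example: standard basis of R^3, f in A_1(D) (coefficients sum to 1).
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
f = [0.5, 0.3, 0.2]
res1, sizes1 = wosgat(f, D, s=2, t=1.0, iterations=1)
res2, sizes2 = wosgat(f, D, s=2, t=1.0, iterations=2)
print(sizes1, sizes2)
```

In contrast to the greedy step of the WOSGA, no sorting or maximization over the dictionary is needed: each element is tested against the threshold independently, and the block sizes $s_m$ may vary from iteration to iteration.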

If $t_m = t$ for $m = 1, 2, \dots$, we use $t$ instead of $\tau$ in the
notation. We will prove an upper bound for the rate of convergence of the
WOSGAT for $f \in A_1(\mathcal{D})$ for a more general dictionary than the
$M$-coherent dictionary.

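Definition 2.1. We say that a dictionary $\mathcal{D}$ is $(N,\beta)$-Bessel
if for any $N$ distinct elements $\psi_1, \dots, \psi_N$ of the dictionary
$\mathcal{D}$ we have for any $f \in H$

$$\|P_{\Psi(N)}(f)\|^2 \ge \beta \sum_{i=1}^{N} |\langle f,\psi_i\rangle|^2,$$

where $\Psi(N) := \operatorname{span}\{\psi_1, \dots, \psi_N\}$.
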
Theorem 2.2. Let $\mathcal{D}$ be an $(N,\beta)$-Bessel dictionary. Then, for
$s \le N$, the WOSGAT($s,t$) provides, after $m$ iterations, an approximation
of $f \in A_1(\mathcal{D})$ with the following upper bound on the error:

$$\|f - G_m(f)\| \le \Big(1 + \beta t^2 \sum_{j=1}^{m} s_j\Big)^{-1/2}.$$

We point out that $\sum_{j=1}^{m} s_j$ is the number of elements that the
algorithm has picked from $\mathcal{D}$ after $m$ iterations. Therefore, the
WOSGAT offers the same error bound (in the sense of order), in terms of the
number of dictionary elements used in the approximant, as the WOGA or the
MWOGA.

We now give some sufficient conditions for a dictionary $\mathcal{D}$ to be
$(N,\beta)$-Bessel. We begin with a simple lemma useful in that regard.

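Lemma 2.1. Let a dictionary $\mathcal{D}$ have the following property of
$(N,A)$-stability: for any $N$ distinct elements $\psi_1, \dots, \psi_N$ of
the dictionary $\mathcal{D}$, we have for any coefficients $c_1, \dots, c_N$

$$\Big\|\sum_{i=1}^{N} c_i \psi_i\Big\|^2 \le A \sum_{i=1}^{N} |c_i|^2.$$

Then $\mathcal{D}$ is $(N, A^{-1})$-Bessel.

Proof. Let $f \in H$. We have

$$\begin{aligned}
\|P_{\Psi(N)}(f)\| &= \sup_{\psi \in \Psi(N),\ \|\psi\| \le 1} |\langle P_{\Psi(N)}(f),\psi\rangle| \\
&= \sup_{(c_1,\dots,c_N):\ \|c_1\psi_1 + \dots + c_N\psi_N\| \le 1} \Big|\sum_{i=1}^{N} \langle f,\psi_i\rangle c_i\Big| \\
&\ge \sup_{(c_1,\dots,c_N):\ |c_1|^2 + \dots + |c_N|^2 \le A^{-1}} \Big|\sum_{i=1}^{N} \langle f,\psi_i\rangle c_i\Big|
 = A^{-1/2}\Big(\sum_{i=1}^{N} |\langle f,\psi_i\rangle|^2\Big)^{1/2}.
\end{aligned}$$

□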

Proposition 2.3. An $M$-coherent dictionary is $(N, (1+M(N-1))^{-1})$-Bessel.

Proof. By Lemma 2.1 from [1], for an $M$-coherent dictionary we have

$$\Big\|\sum_{i=1}^{N} c_i \psi_i\Big\|^2 \le (1 + M(N-1)) \sum_{i=1}^{N} |c_i|^2.$$

Applying Lemma 2.1 above, we obtain the statement of the proposition.

□

The following proposition is a direct corollary of Lemma 2.1.

Proposition 2.4. Let a dictionary $\mathcal{D}$ have the RIP($N,\delta$): for
any distinct $\psi_1, \dots, \psi_N$

$$(1-\delta) \sum_{i=1}^{N} |c_i|^2 \le \Big\|\sum_{i=1}^{N} c_i \psi_i\Big\|^2 \le (1+\delta) \sum_{i=1}^{N} |c_i|^2.$$

Then $\mathcal{D}$ is $(N, (1+\delta)^{-1})$-Bessel.
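
The Bessel constants supplied by Lemma 2.1 and Propositions 2.3 and 2.4 can be tabulated and checked numerically in the smallest nontrivial case $N = 2$. The sketch below is illustrative only: the function names and the sample vectors are assumptions, and the closed form for $\|P_{\Psi(2)}(f)\|^2$ comes from inverting the $2 \times 2$ Gram matrix of two unit vectors with inner product $\mu$.

```python
def bessel_from_stability(A):
    """Lemma 2.1: (N, A)-stability gives the Bessel constant beta = 1/A."""
    return 1.0 / A


def bessel_from_coherence(M, N):
    """Proposition 2.3: beta = 1 / (1 + M (N - 1))."""
    return bessel_from_stability(1.0 + M * (N - 1))


def bessel_from_rip(delta):
    """Proposition 2.4: beta = 1 / (1 + delta)."""
    return bessel_from_stability(1.0 + delta)


def projection_norm_sq(b1, b2, mu):
    """||P f||^2 onto span{psi1, psi2}; b_i = <f, psi_i>, mu = <psi1, psi2>."""
    return (b1 * b1 + b2 * b2 - 2.0 * mu * b1 * b2) / (1.0 - mu * mu)


# Hypothetical example in R^2: psi1 = (1, 0), psi2 = (0.6, 0.8), f = (1, 1).
mu = 0.6
b1, b2 = 1.0, 1.4
lhs = projection_norm_sq(b1, b2, mu)                       # equals ||f||^2 = 2
rhs = bessel_from_coherence(mu, 2) * (b1 * b1 + b2 * b2)   # beta * sum |<f, psi_i>|^2
print(lhs, rhs)
```

Here the Bessel inequality reads $2 \ge 1.85$, consistent with Definition 2.1 for $\beta = (1+M)^{-1}$ and $M = 0.6$.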



3 Proofs 

Proof of Theorem 1.1. Let

$$f = \sum_{j=1}^{\infty} c_j g_j, \quad g_j \in \mathcal{D}, \quad \sum_{j=1}^{\infty} |c_j| \le 1, \quad |c_1| \ge |c_2| \ge \dots. \tag{3.1}$$

Every element of $A_1(\mathcal{D})$ can be approximated arbitrarily well by
elements of the form (3.1). It will be clear from the argument below that it
is sufficient to consider elements $f$ of the form (3.1). Suppose $\nu$ is
such that $|c_\nu| \ge a/s \ge |c_{\nu+1}|$, where $a = \frac{t+3}{2t}$. Then
the above assumption on the sequence $\{c_j\}$ implies that
$\nu \le \lfloor s/a \rfloor$ and $|c_{s+1}| < 1/s$. We claim that the elements
$g_1, \dots, g_\nu$ will be chosen among $\varphi_1, \dots, \varphi_s$ at the
first iteration. Indeed, for $j \in [1,\nu]$ we have

$$|\langle f,g_j\rangle| \ge |c_j| - M \sum_{k \neq j} |c_k| \ge a/s - M(1 - a/s) \ge a/s - M.$$

For all $g$ distinct from $g_1, \dots, g_s$ we have

$$|\langle f,g\rangle| \le M + 1/s.$$
Our assumption $s \le 1/(2M)$ implies that $M + 1/s \le t(a/s - M)$. Thus, we
do not pick any $g \in \mathcal{D}$ distinct from $g_1, \dots, g_s$ until we
have chosen all of $g_1, \dots, g_\nu$.
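
The inequality $M + 1/s \le t(a/s - M)$ can be verified in one line. The derivation below is a sketch that assumes $a = (t+3)/(2t)$, which is the smallest value of $a$ for which the estimate follows from $s \le 1/(2M)$; this value is inferred from the surrounding inequalities.

```latex
% Assumption: a := (t+3)/(2t), with 0 < t <= 1 and s <= 1/(2M), i.e. M <= 1/(2s).
\begin{align*}
M + \frac{1}{s} \le t\Big(\frac{a}{s} - M\Big)
  &\iff (1+t)M + \frac{1}{s} \le \frac{ta}{s}, \\
(1+t)M + \frac{1}{s}
  &\le \frac{1+t}{2s} + \frac{1}{s} = \frac{t+3}{2s} = \frac{ta}{s}.
\end{align*}
```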
Denote

$$f' := f - \sum_{j=1}^{\nu} c_j g_j = \sum_{j=\nu+1}^{\infty} c_j g_j.$$

It is clear from the above argument that

$$f_1 = f - P_{F_1}(f) = f' - P_{F_1}(f') = f' - G_1^{s,t}(f'). \tag{3.2}$$
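Define a new dictionary

$$\mathcal{D}^s := \Big\{ \frac{\sum_{i \in \Lambda} c_i g_i}{\|\sum_{i \in \Lambda} c_i g_i\|} :\ g_i \in \mathcal{D} \text{ distinct},\ |\Lambda| = s,\ \sum_{i \in \Lambda} c_i g_i \neq 0 \Big\}.$$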

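Let $J_l = [(l-1)s + \nu + 1,\ ls + \nu]$. We write

$$f' = \sum_{l=1}^{\infty} \sum_{j \in J_l} c_j g_j = \sum_{l=1}^{\infty} \|\psi_l\| \frac{\psi_l}{\|\psi_l\|}, \tag{3.3}$$

where $\psi_l = \sum_{j \in J_l} c_j g_j$. Clearly $\psi_l/\|\psi_l\| \in \mathcal{D}^s$
for $l \ge 1$. Equation (3.3) implies that

$$f' \in A_1\Big(\mathcal{D}^s,\ \sum_{l=1}^{\infty} \|\psi_l\|\Big).$$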

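By Lemma 2.1 from [1] we bound

$$(1 - Ms) \sum_{j \in J_l} c_j^2 \le \|\psi_l\|^2 \le (1 + Ms) \sum_{j \in J_l} c_j^2. \tag{3.4}$$

Then we obtain

$$\sum_{l=1}^{\infty} \|\psi_l\| \le (1 + Ms)^{1/2} \sum_{l=1}^{\infty} \Big(\sum_{j \in J_l} c_j^2\Big)^{1/2}. \tag{3.5}$$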

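Since the sequence $\{c_j\}$ has the property

$$|c_{\nu+1}| \ge |c_{\nu+2}| \ge \dots, \qquad \sum_{j=\nu+1}^{\infty} |c_j| \le 1, \qquad |c_{\nu+1}| \le a/s, \tag{3.6}$$

we may apply the simple inequality

$$\Big(\sum_{j \in J_l} c_j^2\Big)^{1/2} \le s^{1/2} |c_{(l-1)s+\nu+1}|,$$

so that we bound the sum on the right side of (3.5):

$$\begin{aligned}
\sum_{l=1}^{\infty} \Big(\sum_{j \in J_l} c_j^2\Big)^{1/2}
 &\le s^{1/2} \sum_{l=1}^{\infty} |c_{(l-1)s+\nu+1}| \\
 &\le s^{1/2} \Big(a/s + \sum_{l=2}^{\infty} s^{-1} \sum_{j \in J_{l-1}} |c_j|\Big) \le (a+1) s^{-1/2}.
\end{aligned} \tag{3.7}$$

Using the above inequality in (3.5), together with $Ms \le 1/2$, we obtain that

$$f' \in A_1\big(\mathcal{D}^s,\ (3/2)^{1/2}(a+1)s^{-1/2}\big). \tag{3.8}$$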

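Assume $u := \sum_i a_i h_i \in A_1(\mathcal{D}^s, B)$, where $B$ is an
absolute constant. Then for any $\psi \in \mathcal{D}^s$ we have

$$u' := u - \langle u,\psi\rangle \psi \in A_1(\mathcal{D}^s, 2B),$$

since $|\langle u,\psi\rangle| = |\langle \sum_i a_i h_i,\psi\rangle| \le \|\psi\| \sum_i |a_i| \le B$.
Along with (3.2) and (3.8), the above argument shows that

$$f_1 \in A_1\big(\mathcal{D}^s,\ 2(3/2)^{1/2}(a+1)s^{-1/2}\big). \tag{3.9}$$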

Consider the following quantity

$$q_s := q_s(f_{m-1}) := \sup_{h_i \in \mathcal{D},\ i \in [1,s]} \|P_{H(s)}(f_{m-1})\|,$$

where $H(s) := \operatorname{span}(h_1, \dots, h_s)$. It is clear that

$$q_s = \sup_{H(s)} \max_{\psi \in H(s),\ \|\psi\| \le 1} |\langle f_{m-1},\psi\rangle| = \sup_{g^s \in \mathcal{D}^s} |\langle f_{m-1},g^s\rangle|.$$

Let $\psi = \sum_{i=1}^{s} a_i h_i$. Again by Lemma 2.1 from [1] we bound

$$(1 - Ms) \sum_{i=1}^{s} a_i^2 \le \|\psi\|^2 \le (1 + Ms) \sum_{i=1}^{s} a_i^2. \tag{3.10}$$

Therefore,

$$(1 + Ms)^{-1} \sum_{i=1}^{s} \langle f_{m-1},h_i\rangle^2 \le \|P_{H(s)}(f_{m-1})\|^2 \le (1 - Ms)^{-1} \sum_{i=1}^{s} \langle f_{m-1},h_i\rangle^2. \tag{3.11}$$

Let $p_m := P_{F_m}(f_{m-1})$. In order to relate $q_s^2$ to $\|p_m\|^2$ we
begin with the fact that

$$q_s^2 \le \sup_{h_i \in \mathcal{D},\ i \in [1,s]} (1 - Ms)^{-1} \sum_{i=1}^{s} \langle f_{m-1},h_i\rangle^2.$$

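Consider an arbitrary set $\{h_i\}_{i=1}^{s}$ of distinct elements of the
dictionary $\mathcal{D}$. Let $V$ be the set of all indices $i \in [1,s]$ such
that $h_i = \varphi_{k(i)}$, $k(i) \in I_m$. Denote $V' := \{k(i),\ i \in V\}$.
Then

$$\sum_{i=1}^{s} \langle f_{m-1},h_i\rangle^2 = \sum_{i \in V} \langle f_{m-1},h_i\rangle^2 + \sum_{i \in [1,s] \setminus V} \langle f_{m-1},h_i\rangle^2. \tag{3.12}$$

From the definition of $\{\varphi_k\}_{k \in I_m}$ we get

$$\max_{i \in [1,s] \setminus V} |\langle f_{m-1},h_i\rangle| \le t^{-1} \min_{k \in I_m \setminus V'} |\langle f_{m-1},\varphi_k\rangle|. \tag{3.13}$$

Using (3.13) we continue (3.12):

$$\le \sum_{k \in V'} \langle f_{m-1},\varphi_k\rangle^2 + t^{-2} \sum_{k \in I_m \setminus V'} \langle f_{m-1},\varphi_k\rangle^2 \le t^{-2} \sum_{k \in I_m} \langle f_{m-1},\varphi_k\rangle^2.$$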

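Therefore, using the lower bound in (3.11) with $h_i$ replaced by the
elements $\varphi_k$, $k \in I_m$,

$$q_s^2 \le (1 - Ms)^{-1} t^{-2} \sum_{k \in I_m} \langle f_{m-1},\varphi_k\rangle^2 \le \frac{1 + Ms}{t^2 (1 - Ms)} \|p_m\|^2.$$

This results in the following inequality:

$$\|p_m\|^2 \ge \frac{t^2 (1 - Ms)}{1 + Ms}\, q_s^2. \tag{3.14}$$

Thus we can interpret the WSGA($s,t$) as the WGA($rt$) with respect to the
dictionary $\mathcal{D}^s$, where $r = \big(\frac{1-Ms}{1+Ms}\big)^{1/2}$.
Using (3.9), we get

$$\|f_m\| = \|(f_1)_{m-1}\| \le 6^{1/2} (a+1) s^{-1/2} \gamma(m-1, rt).$$

This completes the proof of Theorem 1.1.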

Proof of Theorem 2.2. For the $\{\varphi_i\}$ from the definition of the
WOSGAT, denote

$$F_m := \operatorname{span}(\varphi_i,\ i \in I_m).$$

It is easy to see that $H_m = H_{m-1} \oplus F_m$. Therefore,

$$\begin{aligned}
f_m = f - P_{H_m}(f) &= f_{m-1} + G_{m-1}(f) - P_{H_m}(f_{m-1} + G_{m-1}(f)) \\
&= f_{m-1} - P_{H_m}(f_{m-1}).
\end{aligned}$$

Then the inclusion $F_m \subset H_m$ implies

$$\|f_m\| \le \|f_{m-1} - P_{F_m}(f_{m-1})\|. \tag{3.15}$$

Using the notation $p_m := P_{F_m}(f_{m-1})$, we continue

$$\|f_{m-1}\|^2 = \|f_{m-1} - p_m\|^2 + \|p_m\|^2$$

and by (3.15)

$$\|f_m\|^2 \le \|f_{m-1}\|^2 - \|p_m\|^2. \tag{3.16}$$

We now prove a lower bound for $\|p_m\|$. By our assumption that the
dictionary is $(N,\beta)$-Bessel we get

$$\|P_{F_m}(f_{m-1})\|^2 \ge \beta \sum_{i \in I_m} |\langle f_{m-1},\varphi_i\rangle|^2.$$

Then, by the thresholding condition of the greedy step (1), we obtain

$$\|P_{F_m}(f_{m-1})\|^2 \ge \beta t^2 s_m \|f_{m-1}\|^4. \tag{3.17}$$

Substituting this bound in (3.16), we get

$$\|f_m\|^2 \le \|f_{m-1}\|^2 \big(1 - \beta t^2 s_m \|f_{m-1}\|^2\big). \tag{3.18}$$

We now apply the following lemma from [3].

Lemma 3.1. Let $\{a_m\}_{m=0}^{\infty}$ be a sequence of nonnegative numbers
satisfying the inequalities

$$a_0 \le A, \qquad a_m \le a_{m-1}\big(1 - \lambda_m^2 a_{m-1}/A\big), \quad m = 1, 2, \dots.$$

Then we have for each $m$

$$a_m \le A \Big(1 + \sum_{k=1}^{m} \lambda_k^2\Big)^{-1}.$$

Applied with $a_m := \|f_m\|^2$, $A := 1$ and $\lambda_m^2 := \beta t^2 s_m$,
it gives us

$$\|f_m\| \le \Big(1 + \beta t^2 \sum_{j=1}^{m} s_j\Big)^{-1/2}.$$

□
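
The way Lemma 3.1 converts the recursion (3.18) into the bound of Theorem 2.2 can be illustrated numerically. The sketch below iterates the recursion with equality (the least favorable case allowed by the hypothesis) and compares it with the bound of the lemma; the values of $\beta$, $t$ and the block sizes $s_m$ are hypothetical, chosen so that every factor $1 - \lambda_m^2 a_{m-1}/A$ stays nonnegative.

```python
def worst_case_sequence(A, lambdas_sq):
    """a_0 = A and a_m = a_{m-1} (1 - lambda_m^2 a_{m-1} / A), with equality."""
    a = [A]
    for lam_sq in lambdas_sq:
        a.append(a[-1] * (1.0 - lam_sq * a[-1] / A))
    return a


def lemma_bound(A, lambdas_sq, m):
    """Bound of Lemma 3.1: A / (1 + sum_{k<=m} lambda_k^2)."""
    return A / (1.0 + sum(lambdas_sq[:m]))


# Theorem 2.2 setting: a_m = ||f_m||^2, A = 1, lambda_m^2 = beta t^2 s_m.
beta, t = 0.25, 1.0              # hypothetical Bessel constant and weakness
sizes = [2, 1, 2, 2, 1, 2]       # hypothetical block sizes s_1, ..., s_6
lambdas_sq = [beta * t * t * s_m for s_m in sizes]
a = worst_case_sequence(1.0, lambdas_sq)
bounds = [lemma_bound(1.0, lambdas_sq, m) for m in range(len(a))]
print(list(zip(a, bounds)))
```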

Acknowledgements. We are grateful to Prof. Ming-Jun Lai for helpful
discussions. This research was supported by the National Science Foundation
Grant DMS-0906260.



References 

[1] Donoho, D.L., Elad, M., Temlyakov, V.N.: On Lebesgue-type inequal- 
ities for greedy approximation. J. Approx. Theory. 147(2), 185-195 
(2007) 

[2] Kerkyacharian, G., Mougeot, M., Picard, D., Tribouley, K.: Learning
out of leaders. Multiscale, Nonlinear and Adaptive Approximation,
DOI 10.1007/978-3-642-03413-8-9, Springer-Verlag Berlin Heidelberg,
pp. 295-324 (2009)

[3] Liu, E., Temlyakov, V.N.: Orthogonal super greedy algorithm and its
applications in compressed sensing. Preprint (2010)
http://dsp.rice.edu/sites/dsp.rice.edu/files/cs/LiuTemlyakov.pdf

[4] Temlyakov, V.N.: Weak greedy algorithms. Adv. Comput. Math. 12, 
213-227 (2000) 

[5] Temlyakov, V.N.: Greedy approximation. Acta Numerica. pp. 235-409 
(2008) 

[6] Jones, L.: A simple lemma on greedy approximation in Hilbert space and
convergence rates for projection pursuit regression and neural network
training. The Annals of Statistics. 20, 608-613 (1992)

[7] Temlyakov, V.N.: Relaxation in greedy approximation. Constr. Approx.
28, 1-25 (2008)

[8] DeVore, R.A., Temlyakov, V.N.: Some remarks on greedy algorithms. 
Adv. Comput. Math. 5, 173-187 (1996) 






[9] Temlyakov, V.N.: Greedy algorithms and M-term approximation with
regard to redundant dictionaries. J. Approx. Theory. 98, 117-145 (1999)

[10] Temlyakov, V.N.: Nonlinear methods of approximation. Found. Comput.
Math. 3, 33-107 (2003)






